A Recovery Conscious Framework for Fault Resilient Storage Systems

Authors

  • Sangeetha Seshadri
  • Ling Liu
  • Lawrence Chiu
  • Cornel Constantinescu
Abstract

This paper presents a recovery-conscious framework for improving the fault resiliency and recovery efficiency of highly concurrent embedded storage software systems. Our framework consists of a three-tier architecture and a suite of recovery-conscious techniques. In the top tier, we promote fine-grained recovery at the task level by introducing recovery scopes to model recovery dependencies between tasks. At the middle tier, we develop highly effective groupings of recovery scopes into recovery groups based on system and workload characteristics. We study how to distribute recovery scopes between recovery groups and schedule recovery groups effectively in a multi-core storage system through careful tuning of recovery-efficiency-sensitive parameters. At the bottom tier, we advocate the use of recovery-conscious scheduling instead of performance-oriented scheduling to provide high recovery efficiency without sacrificing system performance. An important question to address in this tier is under which combinations of resource pools and recovery groups recovery-conscious scheduling outperforms performance-oriented scheduling. Our techniques have been implemented on a real industry-standard storage system. Experimental results show that the right choice of recovery-sensitive parameters is critical and that our techniques are effective, non-intrusive, and can significantly boost system resilience while delivering high performance under a variety of system configurations.
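The three tiers described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of connected components to form scopes, the greedy size-balancing of groups, and the even split of workers into pools are all assumptions made for illustration only.

```python
from collections import defaultdict


def recovery_scopes(tasks, dependencies):
    """Top tier (assumed model): tasks that share a recovery dependency
    fall into the same scope, i.e. a connected component of the graph."""
    parent = {t: t for t in tasks}

    def find(t):
        # Union-find lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in dependencies:
        parent[find(a)] = find(b)

    scopes = defaultdict(set)
    for t in tasks:
        scopes[find(t)].add(t)
    # Sort for a deterministic result.
    return sorted((frozenset(s) for s in scopes.values()), key=sorted)


def recovery_groups(scopes, num_groups):
    """Middle tier (assumed heuristic): place the largest scopes first,
    each into the group that currently holds the fewest tasks."""
    groups = [set() for _ in range(num_groups)]
    for scope in sorted(scopes, key=len, reverse=True):
        min(groups, key=len).update(scope)
    return groups


def assign_pools(groups, total_workers):
    """Bottom tier (assumed policy): give each recovery group its own
    worker pool, so a group under recovery can stall only its own pool."""
    base, extra = divmod(total_workers, len(groups))
    return {i: base + (1 if i < extra else 0) for i in range(len(groups))}
```

For example, with hypothetical tasks `read`, `write`, `flush`, `scrub`, and `log`, where `read`/`write` and `write`/`flush` share recovery dependencies, the first three tasks form one scope and the other two form singleton scopes. Binding each group to its own pool is the simplest way to bound how many workers a failing group can tie up; the abstract's open question is for which pool and group combinations this beats a purely performance-oriented scheduler.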

Similar resources

Enhancing Storage System Availability on Multi-Core Architectures with Recovery-Conscious Scheduling

In this paper we develop a recovery conscious framework for multi-core architectures and a suite of techniques for improving the resiliency and recovery efficiency of highly concurrent embedded storage software systems. Our techniques aim at providing continuous availability and performance during recovery while minimizing the time to recovery and the need for rearchitecting the system (legacy ...

Resilient operation scheduling of microgrid using stochastic programming considering demand response and electrical vehicles

Resilient operation of microgrids is an important concept in modern power systems. Its goal is to anticipate and limit risks and to provide appropriate and continuous services under changing conditions. Many factors cause the operation mode of a microgrid to change between islanded and grid-connected modes. On the other hand, nowadays, electric vehicles (EVs) are desirable energy storag...

Detailed Modeling and Novel Scheduling of Plug-in Electric Vehicle Energy Storage Systems for Energy Management of Multi-microgrids Considering the Probability of Fault Occurrence

As an effective means of displacing fossil fuel consumption and reducing greenhouse gas emissions, plug-in electric vehicles (PEVs) and plug-in hybrid electric vehicles (PHEVs) have attracted growing attention. From the power grid perspective, PHEVs and PEVs equipped with batteries can also be used as energy storage facilities, because these vehicles are parked most of the ...

Latecomer and Crash Recovery Support in Fault-Tolerant Groupware

Distributed collaboration systems must allow dynamic joining and leaving of sessions and therefore must support latecomers and crash recovery. We present two distributed algorithms for supporting latecomers and crash recovery and evaluate them within the DISCIPLE framework for collaboration. Both algorithms are generic, independent of the application semantics, and can work with arbitrary JavaB...

Resilient Virtualized Systems Using ReHype

System-level virtualization introduces critical vulnerabilities to failures of the software components that implement virtualization – the virtualization infrastructure (VI). To mitigate the impact of such failures, we introduce a resilient VI (RVI) that can recover individual VI components from failure, caused by hardware or software faults, transparently to the hosted virtual machines (VMs). ...

Publication date: 2007